598 research outputs found

    Matching Sequences under Deletion/Insertion Constraints

    Guided genome halving: hardness, heuristics and the history of the Hemiascomycetes

    Motivation: Some present-day species have incurred a whole genome doubling event in their evolutionary history, and this is reflected today in patterns of duplicated segments scattered throughout their chromosomes. These duplications may be used as data to ‘halve’ the genome, i.e. to reconstruct the ancestral genome at the moment of doubling, but the solution is often highly nonunique. To resolve this problem, we take account of outgroups, external reference genomes, to guide and narrow down the search.

    Algorithmic and Hardness Results for the Colorful Components Problems

    In this paper we investigate the colorful components framework, motivated by applications emerging from comparative genomics. The general goal is to remove a collection of edges from an undirected vertex-colored graph $G$ such that in the resulting graph $G'$ all the connected components are colorful (i.e., any two vertices of the same color belong to different connected components). We want $G'$ to optimize an objective function, the selection of this function being specific to each problem in the framework. We analyze three objective functions, and thus, three different problems, which are believed to be relevant for the biological applications: minimizing the number of singleton vertices, maximizing the number of edges in the transitive closure, and minimizing the number of connected components. Our main result is a polynomial time algorithm for the first problem. This result disproves the conjecture of Zheng et al. that the problem is NP-hard (assuming $P \neq NP$). Then, we show that the second problem is APX-hard, thus proving and strengthening the conjecture of Zheng et al. that the problem is NP-hard. Finally, we show that the third problem does not admit polynomial time approximation within a factor of $|V|^{1/14 - \epsilon}$ for any $\epsilon > 0$, assuming $P \neq NP$ (or within a factor of $|V|^{1/2 - \epsilon}$, assuming $ZPP \neq NP$). Comment: 18 pages, 3 figures
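
    As a rough illustration of the definitions in this abstract (not the paper's algorithm), the sketch below checks whether every connected component of a vertex-colored graph is colorful and evaluates the three objective values for a given edge-deleted graph. The function names, the dictionary-based color map, and the toy graph are assumptions made for this example only.

```python
from collections import defaultdict


def connected_components(vertices, edges):
    """Return the connected components of an undirected graph as lists."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:  # iterative depth-first search
            u = stack.pop()
            comp.append(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def is_colorful(vertices, edges, color):
    """True if no component contains two vertices of the same color."""
    for comp in connected_components(vertices, edges):
        colors = [color[v] for v in comp]
        if len(colors) != len(set(colors)):
            return False
    return True


def objectives(vertices, edges):
    """Values of the three objective functions named in the abstract."""
    comps = connected_components(vertices, edges)
    return {
        "singletons": sum(1 for c in comps if len(c) == 1),
        # a component with k vertices contributes k*(k-1)/2 closure edges
        "closure_edges": sum(len(c) * (len(c) - 1) // 2 for c in comps),
        "components": len(comps),
    }


# Hypothetical toy instance: a path a1 - b1 - a2 - c1 with two red vertices.
vertices = ["a1", "b1", "a2", "c1"]
color = {"a1": "red", "a2": "red", "b1": "blue", "c1": "green"}
path = [("a1", "b1"), ("b1", "a2"), ("a2", "c1")]
after_deletion = [("a1", "b1"), ("a2", "c1")]  # edge (b1, a2) removed
```

    On this toy path, deleting the middle edge separates the two red vertices, so the remaining graph is colorful; choosing which edges to delete so as to optimize one of the three objectives is the actual problem the abstract studies.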

    Power Boosts for Cluster Tests

    Abstract. Gene cluster significance tests that are based on the number of genes in a cluster in two genomes, and how compactly they are distributed, but not their order, may be made more powerful by the addition of a test component that focuses solely on the similarity of the ordering of the common genes in the clusters in the two genomes. Here we suggest four such tests, compare them, and investigate one of them, the maximum adjacency disruption criterion, in some detail, analytically and through simulation.

    Parking functions, labeled trees and DCJ sorting scenarios

    In genome rearrangement theory, one of the elusive questions raised in recent years is the enumeration of rearrangement scenarios between two genomes. This problem is related to the uniform generation of rearrangement scenarios, and the derivation of tests of statistical significance of the properties of these scenarios. Here we give an exact formula for the number of double-cut-and-join (DCJ) rearrangement scenarios of co-tailed genomes. We also construct effective bijections between the set of scenarios that sort a cycle and well studied combinatorial objects such as parking functions and labeled trees. Comment: 12 pages, 3 figures
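
    For readers unfamiliar with the combinatorial objects named in this abstract, the sketch below enumerates parking functions by brute force and compares the count with the classical value $(n+1)^{n-1}$, which by Cayley's formula is also the number of labeled trees on $n+1$ vertices. It does not reproduce the paper's bijections or its DCJ formula; the function names are chosen for this example only.

```python
from itertools import product


def is_parking_function(seq):
    """A sequence of n values in 1..n is a parking function if, after
    sorting, the i-th smallest value is at most i (1-based)."""
    return all(b <= i for i, b in enumerate(sorted(seq), start=1))


def count_parking_functions(n):
    """Count parking functions of length n by exhaustive enumeration."""
    return sum(
        1
        for seq in product(range(1, n + 1), repeat=n)
        if is_parking_function(seq)
    )


def cayley_count(n):
    """Classical closed form (n+1)^(n-1): parking functions of length n,
    equivalently labeled trees on n+1 vertices."""
    return (n + 1) ** (n - 1)


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, count_parking_functions(n), cayley_count(n))
```

    The exhaustive count is only practical for small $n$, but it is enough to see the two sequences agree term by term.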

    A framework for orthology assignment from gene rearrangement data

    Abstract. Gene rearrangements have successfully been used in phylogenetic reconstruction and comparative genomics, but usually under the assumption that all genomes have the same gene content and that no gene is duplicated. While these assumptions allow one to work with organellar genomes, they are too restrictive when comparing nuclear genomes. The main challenge is how to deal with gene families, specifically, how to identify orthologs. While searching for orthologies is a common task in computational biology, it is usually done using sequence data. We approach that problem using gene rearrangement data, provide an optimization framework in which to phrase the problem, and present some preliminary theoretical results.

    On the PATHGROUPS approach to rapid small phylogeny

    We present a data structure enabling rapid heuristic solution to the ancestral genome reconstruction problem for given phylogenies under genomic rearrangement metrics. The efficiency of the greedy algorithm is due to fast updating of the structure during run time and a simple priority scheme for choosing the next step. Since accuracy deteriorates for sets of highly divergent genomes, we investigate strategies for improving accuracy and expanding the range of data sets where accurate reconstructions can be expected. This includes a more refined priority system, and a two-step look-ahead, as well as iterative local improvements based on the median version of the problem, incorporating simulated annealing. We apply this to a set of yeast genomes to corroborate a recent gene sequence-based phylogeny.